Supplemental Material: Online Inference for the Infinite Topic-Cluster Model: Storylines from Streaming Text

Authors

  • Amr Ahmed
  • Qirong Ho
  • Choon Hui Teo
  • Jacob Eisenstein
  • Alex J. Smola
  • Eric P. Xing
Abstract

We use $C$ to denote a generic count of co-occurrences; for example, $C_{tdk}$ is the number of words in document $d$ at epoch $t$ that are generated from topic $k$. Replacing an index with a dot denotes summation over that index; for example, $C_{td\cdot}$ is the total number of words in document $d$ at epoch $t$, and $C_{t\cdot k}$ is the total number of words generated from topic $k$ at epoch $t$. Finally, a negative sign in the superscript denotes exclusion; for example, $C^{-tdi}_{tdk}$ is the same quantity as $C_{tdk}$ without the contribution of word $i$, although we sometimes abuse notation and shorten the superscript to $-i$ when the meaning is clear from the context.
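The count notation can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the toy `assignments` data and the helper names `count_tdk` and `marginalize` are hypothetical and do not come from the paper. It shows how the full count, the dotted (summed) counts, and the exclusion count relate to one another.

```python
from collections import Counter

# Hypothetical toy data: one tuple per word, (epoch t, document d, position i, topic k).
assignments = [
    (0, 0, 0, 1),
    (0, 0, 1, 1),
    (0, 0, 2, 0),
    (0, 1, 0, 1),
    (1, 0, 0, 0),
]

def count_tdk(assignments, exclude=None):
    """Return C_{tdk} keyed by (t, d, k).

    Passing exclude=(t, d, i) leaves out that single word, which
    corresponds to the exclusion count C^{-tdi}_{tdk}.
    """
    counts = Counter()
    for t, d, i, k in assignments:
        if (t, d, i) != exclude:
            counts[(t, d, k)] += 1
    return counts

def marginalize(counts, axis):
    """Sum out one index of the key (0 = t, 1 = d, 2 = k), i.e. replace it with a dot."""
    out = Counter()
    for key, n in counts.items():
        out[key[:axis] + key[axis + 1:]] += n
    return out

C = count_tdk(assignments)                           # C_{tdk}
C_td = marginalize(C, 2)                             # C_{td.}, keyed by (t, d)
C_tk = marginalize(C, 1)                             # C_{t.k}, keyed by (t, k)
C_excl = count_tdk(assignments, exclude=(0, 0, 1))   # C^{-tdi}_{tdk} for t=0, d=0, i=1
```

In a collapsed Gibbs sampler, exclusion counts of this kind are typically obtained by decrementing the affected entries before resampling a word and incrementing them afterwards, rather than by recounting; the recount here is chosen only for clarity.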


Similar resources

Online Inference for the Infinite Topic-Cluster Model: Storylines from Streaming Text

We present the time-dependent topic-cluster model, a hierarchical approach for combining Latent Dirichlet Allocation and clustering via the Recurrent Chinese Restaurant Process. It inherits the advantages of both of its constituents, namely interpretability and concise representation. We show how it can be applied to streaming collections of objects such as real world feeds in a news portal. We...


Online Latent Dirichlet Allocation with Infinite Vocabulary

Topic models based on latent Dirichlet allocation (LDA) assume a predefined vocabulary. This is reasonable in batch settings but not reasonable for streaming and online settings. To address this lacuna, we extend LDA by drawing topics from a Dirichlet process whose base distribution is a distribution over all strings rather than from a finite Dirichlet. We develop inference using online variati...


Online Variational Inference for the Hierarchical Dirichlet Process

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixed-membership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or “topics”) in the collection. Given a document collect...


Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...


Nonparametric Bayesian Storyline Detection from Microtexts

News events and social media are composed of evolving storylines, which capture public attention for a limited period of time. Identifying storylines requires integrating temporal and linguistic information, and prior work takes a largely heuristic approach. We present a novel online non-parametric Bayesian framework for storyline detection, using the distance-dependent Chinese Restaurant Proce...



Publication date: 2011